Making Documents Work: Challenges for Document Understanding
Abstract
In this paper I explain the nature of document understanding in all of its dimensions. I first describe the characteristics of data, knowledge, and information in order to show how they interweave synergetically. I then structure the inherent complexity of the sub-problems of document understanding, which may not be solved serially but are rather attributes of individual documents. The paper therefore focuses on system engineering challenges. I also present some recent work on the different topics and give some insight into the individual techniques we chose at DFKI.
Similar resources
A New Text-Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Challenges in Moving from Documents to Information Web for Services
The service industry has traditionally relied on physical documents to communicate and manage its project and operational activities. As it adopts the object-centric view of Web 2.0 technologies in its productivity tools, knowledge-based workers now have to work with an interconnected, object-centric view of information where they earlier had to deal only with documents created from wordprocess...
An Analysis of Ministry of Education’s Strategic Plans Based on Favorable Components of English Language Teaching Using Shannon’s Entropy
The present research aims to analyze the content of Ministry of Education’s strategic plans (the Fundamental Reform Document of Education, the Comprehensive National Scientific Plan and the National Curriculum Document) based on Shannon's entropy regarding the favorable components of teaching English. The contents of the Fundamental Reform Document of Education, the Comprehensive National Scien...
Automatic Document Topic Identification Using Hierarchical Ontology Extracted from Human Background Knowledge
The rapid growth in the number of documents available to end users from around the world has led to a greatly-increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This constitutes one of the main current challenges in text mining. In this work, a novel technique is proposed, to automatically construct a background knowledge structure ...
Hiérarchie: Interactive Visualization for Hierarchical Topic Models
Existing algorithms for understanding large collections of documents often produce output that is nearly as difficult and time consuming to interpret as reading each of the documents themselves. Topic modeling is a text understanding algorithm that discovers the “topics” or themes within a collection of documents. Tools based on topic modeling become increasingly complex as the number of topics...
Publication date: 2003